ABSTRACT

We measure the two-point correlation function of G-dwarf stars within 1-3 kpc of the Sun in
multiple lines-of-sight using the Schlesinger et al. G-dwarf sample from the SDSS SEGUE survey. 
The shapes of the correlation functions along individual SEGUE lines-of-sight depend sensitively 
on both the stellar-density gradients and the survey geometry. We fit smooth disk galaxy models 
to our SEGUE clustering measurements, and obtain strong constraints on the thin- and thick-disk 
components of the Milky Way. Specifically, we constrain the values of the thin- and thick-disk scale 
heights with 3% and 2% precision, respectively, and the values of the thin- and thick-disk scale lengths 
with 20% and 8% precision, respectively. Moreover, we find that a two-disk model is unable to fully 
explain our clustering measurements, which exhibit an excess of clustering at small scales (< 50 pc). 
This suggests the presence of small-scale substructure in the disk system of the Milky Way. 

Subject headings: Galaxy: disk - Galaxy: fundamental parameters - Galaxy: structure - methods: 
data analysis - methods: statistical - surveys 


1. INTRODUCTION

The Milky Way provides a unique laboratory for studying the structure of a galaxy in
detail, by allowing us to measure and analyze the properties of large samples of
individual stars (see reviews by Ivezić et al. 2012 and Rix & Bovy 2013). Recent surveys,
such as the Sloan Digital Sky Survey (SDSS I-III; York et al. 2000; Eisenstein et al.
2011), the Two-Micron All Sky Survey (2MASS; Skrutskie et al. 2006), the Radial Velocity
Experiment (RAVE; Kordopatis et al. 2013), and others have placed strong constraints on
the smooth components of the Milky Way (e.g., Carollo et al. 2010; Bovy et al. 2012b),
and have discovered significant spatial substructure in the Milky Way, such as stellar
streams (e.g., Belokurov et al. 2006) and stellar overdensities (e.g., Jurić et al. 2008).
Investigating the structure of the Milky Way provides clues about galaxy formation and
evolution that cannot be extracted from observations of distant galaxies.

The Sloan Extension for Galactic Understanding and Exploration (SEGUE; Yanny et al. 2009)
is a spectroscopic sub-survey of the SDSS that focused on Galactic science. SEGUE provides
the largest spectroscopic sample of Galactic stars currently available, and covers a more
extensive volume of the Milky Way than previous studies, probing from the local disk all
the way to the outer stellar halo. The full SEGUE survey provides an unprecedented
opportunity to investigate the structure of the Milky Way (e.g., Carollo et al. 2010;
de Jong et al. 2010; Cheng et al. 2012).

The spatial two-point correlation function is one of the simplest and most effective
statistical tools for studying clustering in general, and it is widely used in studies of
the large-scale structure of the Universe (Peebles 1973; see Anderson et al. 2014 for a
recent example). However, it has rarely been used in Galactic structure studies, mainly
due to the lack of large and homogeneous spectroscopic stellar samples. There have only
been a few applications of the correlation function to Galactic halo stars, especially
giants and blue horizontal-branch (BHB) stars, but the sample sizes were limited.
Doinidis & Beers (1989) analyzed over 4,400 BHB stars, and found an excess correlation
at separations r < 25 pc. Starkenburg et al. (2009) developed a phase-space correlation
function, and applied it to 101 giants in the Spaghetti project (Morrison et al. 2000) to
search for substructures in the halo. The phase-space correlation function has also been
applied to various BHB samples to quantify the amount of spatial and kinematic
substructure in the Milky Way's stellar halo (De Propris et al. 2010; Xue et al. 2011;
Cooper et al. 2011). In addition to these spatial studies, the angular two-point
correlation function has been used to study the stellar cluster distribution
(López-Corredoira et al. 1998) and to search for wide binaries (see Longhitano & Binggeli
2010 as an example).

With the advent of large stellar samples provided by the SEGUE survey, it is time to
explore Galactic structure by applying the correlation function to stars. In this article,
we measure the full 3-D spatial two-point correlation function of the SEGUE G-dwarf
sample, which is the largest stellar category in the survey. In §2 we describe the basics
of the SEGUE survey and the G-dwarf sample we use. In §3 we present our correlation
function measurements and build intuition about their shape by investigating how it
depends on the underlying stellar-density gradient and survey geometry. In §4 we fit a
smooth Galactic model to our measurements and in §5 we study residuals with respect to
this model. Finally, we summarize our results and discuss possible future work in §6.

Figure 1. Sky map of the 152 SEGUE fields used in this study, shown in a Mollweide
projection in Galactic coordinates. Each point indicates the location of a single
pencil-beam volume that is probed by a SEGUE spectroscopic plate covering 7 deg^2 on the
sky.

Figure 2. A selection of SEGUE pencil-beam fields in a slice perpendicular to the Galactic
plane, including the Galactic center. Specifically, the slice shows fields with Galactic
longitudes within ten degrees of 0° or 180° Galactic longitude. Each dot shows the
location of a SEGUE G dwarf, with red points indicating stars with distances between
1-3 kpc, which are used in this study.


2. THE SEGUE G-DWARF SAMPLE 

The SEGUE survey makes use of the dedicated SDSS telescope (Gunn et al. 2006) and
multi-object spectrograph (Smee et al. 2013). SEGUE combines the extensive and uniform
photometry from the SDSS with medium-resolution (R ~ 1800) spectroscopy over a broad
spectral range (3800-9200 Å) for ~240,000 stars spanning a range of spectral types. SEGUE
was designed to sample Galactic structure at a variety of distances in ~200 'pencil-beam'
volumes spread over the sky available from Apache Point. Each pencil beam corresponds to
a single SDSS spectroscopic plate covering a circular region of 7 square degrees and
probes a selection of stars in that line-of-sight with up to 640 spectroscopic fibers
(Yanny et al. 2009). Figure 1 displays the sky positions of the pencil beams included in
this study using a Mollweide projection in Galactic coordinates. Figure 2 presents an
edge-on view of the pencil beams with Galactic longitudes near the Galactic center and
the Galactic anticenter.

The G-dwarf sample represents SEGUE's largest single homogeneous stellar spectral target
category. The SEGUE G dwarfs are defined as having magnitudes and colors in the range
$14.0 < r_0 < 20.2$ and $0.48 < (g-r)_0 < 0.55$, where $g_0$ and $r_0$ are the
extinction-corrected g- and r-band magnitudes (the extinction correction uses the
Schlegel et al. 1998 dust map). This simple target selection makes the selection biases
relatively straightforward to understand (Yanny et al. 2009). Here we use the G-dwarf
catalog with distances and weights derived by Schlesinger et al. (2012). Distances are
estimated with an isochrone-matching technique that is accurate to ~8% for metal-poor and
~18% for metal-rich stars (An et al. 2009).

We also apply the target-type weights and the r-magnitude weights in the catalog
described by Schlesinger et al. (2012) to correct for various selection biases. SEGUE
categories often focus on specific ranges in parameter space, and targets that fulfill
multiple target-type criteria have multiple opportunities to be assigned a spectroscopic
fiber. This approach leads to a slightly biased G-dwarf selection, which can be corrected
for by the target-type weights. SEGUE assigns roughly the same number, ~300, of
spectroscopic fibers to G-dwarf targets on each plug-plate, but this is far less than the
actual number of available G dwarfs, which also varies from field to field. As the stellar
number density changes over the SEGUE footprint, we must use r-magnitude weights to
correct for this variable sampling, in order to better represent the true underlying
stellar distribution in the Milky Way. For more details about the survey completeness and
weights, we refer readers to Schlesinger et al. (2012, §4.7). Figure 3 presents the
distribution of G-dwarf stars with distance for a selection of nine SEGUE fields of
varying Galactic latitude and longitude. The figure shows both the raw and weighted
stellar distributions. Although the different lines-of-sight contain similar numbers of
G-dwarf stars (as seen from the unweighted distributions), it is clear that there are
large differences in the weighted distributions. Fields near the Galactic disk and the
Galactic center have larger r-weights to account for the denser stellar distributions in
those directions.

To achieve a sufficiently high number density of stars throughout our sample volume, and
to avoid unrealistically large weights at the near and far ends of the pencil beams, we
restrict the sample to stars with distances from 1-3 kpc, and ignore pencil beams
containing fewer than 50 G dwarfs. These selection criteria produce a sample of 18,067
G dwarfs in 152 pencil beams that we use in our analysis.

Figure 3. Distribution of G-dwarf stars with distance, along a selection of nine SEGUE
lines-of-sight. Each panel shows a particular SEGUE field, with panels arranged so that,
going from top to bottom, fields point farther away from the Galactic plane in latitude,
and, going from left to right, fields point farther away from the Galactic center in
longitude. The Galactic coordinates of each field are listed at the top of each panel.
The unweighted distributions are shown in red, while the weighted distributions, which
are corrected for incompleteness, are shown in blue (see Schlesinger et al. 2012 for more
details on the weighting scheme employed).


3. TWO-POINT CORRELATION FUNCTION 
MEASUREMENTS 

In galaxy surveys, a common method to estimate the correlation function of a given sample
is to construct a denser and uniform random sample with the same survey geometry, and
then, in each distance separation bin $[r, r + \Delta r]$, count the number of pairs in
both the survey data and the random sample. The correlation function can then be
estimated by the so-called natural estimator,

$$\xi(r) = \frac{DD(r)}{RR(r)} - 1, \qquad (1)$$

where DD are the weighted and normalized pair counts of objects found in each separation
bin, and RR are the normalized pair counts of random points. The two terms are normalized
by dividing by the square of the total number of data and random points, respectively.
When estimating the correlation function of galaxies, it makes sense to use a uniformly
distributed random sample because the universe is intrinsically homogeneous and isotropic
on large scales. However, this is not the case for stars within the Galaxy, which are
distributed in disk and halo structures that exhibit strong global density gradients.

Figure 4. Dependence of the correlation function on the underlying density gradient. The
correlation functions are computed for mock star samples that all have the same
pencil-beam geometry as one of our SEGUE lines-of-sight, but are designed to have
different power-law stellar density profiles, as listed in the panel. Each curve is the
average over 1000 mock samples containing 1000 stars each; the error bars show the
uncertainty in the mean as estimated from the standard deviation among the 1000 mocks.
The correlation function has a complex shape, and is highly sensitive to the
stellar-density gradient, especially on small scales.

If we know the global spatial-density distribution of stars in the Galaxy, we can
construct a substitute for the random sample that instead follows the same global
distribution as the stars. The measured correlation function will then mostly cancel on
all scales, and reveal whatever excess clustering remains. If we do not fully know the
underlying density distribution of stars, we can still compare the observed data to a
uniform random sample, but then the measured correlation function will have a shape that
encodes this information. The pencil-beam survey geometry can also add complications. The
interplay between the survey geometry and the non-uniform density distribution of stars
can create additional signals in the correlation function.
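As a rough illustration of the natural estimator in equation 1, the following Python sketch counts weighted pairs in separation bins and forms DD/RR - 1. It is a minimal brute-force sketch, not the code used in this work: the function names `pair_counts` and `natural_estimator`, the use of the squared total weight as the normalization for weighted data, and the treatment of empty bins are assumptions made for illustration.

```python
import math
from itertools import combinations


def pair_counts(points, weights, bin_edges):
    """Weighted pair counts per separation bin.

    points    : list of (x, y, z) positions
    weights   : one weight per point (use 1.0 for unweighted random points)
    bin_edges : increasing separation bin edges, same length unit as points

    Counts are normalized by the square of the total weight (an assumed
    generalization of "the square of the total number" to weighted data).
    """
    counts = [0.0] * (len(bin_edges) - 1)
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        r = math.dist(p, q)
        for k in range(len(counts)):
            if bin_edges[k] <= r < bin_edges[k + 1]:
                counts[k] += weights[i] * weights[j]
                break
    total = sum(weights)
    return [c / total**2 for c in counts]


def natural_estimator(dd, rr):
    """xi(r) = DD(r) / RR(r) - 1 per bin; None where RR has no pairs."""
    return [d / r - 1.0 if r > 0 else None for d, r in zip(dd, rr)]
```

If the data and the comparison sample follow the same distribution, the estimator scatters around zero in every bin, which is the property exploited when the random sample is later replaced by a model sample.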

Before computing the correlation function of the SEGUE stars, we first investigate how
stellar-density gradients and the pencil-beam survey geometry can affect the shape of the
correlation function in general, by creating different mock star samples and measuring
their correlation function. First, we set the mock survey geometry to be the same as that
in one of our SEGUE lines-of-sight, i.e., a pencil beam with an angular diameter of 3° and
distances between 1-3 kpc. We generate mock star samples within this geometry, each
containing 1000 mock stars, using different power-law density profiles. Specifically, the
density gradients we adopt are power laws, $n \propto d^{\alpha}$, with both negative and
positive indices $\alpha$ (the adopted values are listed in Figure 4), where n is the
number density of stars and d is the distance from the observer. We then construct a
uniformly distributed random sample with ten times the number density, and calculate the
correlation function using equation 1 for each density profile. Finally, we repeat these
steps 1000 times, using independent realizations of the mock samples, and average the
results to reduce the noise. Note that these mock samples with power-law density
gradients are not meant to represent realistic Galactic models, but serve the purpose of
building intuition on how density gradients can affect the derived correlation function.
We study realistic Galactic models in §4.

Figure 5. Dependence of the correlation function on survey geometry. The correlation
functions are computed for mock star samples that all have the same power-law
stellar-density profile with a negative index, but occupy different sample geometries.
All sample geometries range in distance from 1-3 kpc, but their angular extent on the sky
varies from a circle of radius $\theta_{\rm radius} = 1.5°$, all the way up to the full
sky, as listed in the panel. As in Fig. 4, points and errors are estimated from 1000 mock
samples. The correlation function is sensitive to the survey geometry, featuring a
minimum at a scale approximately equal to the diameter of the pencil-beam volume at its
halfway point along the line-of-sight.
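The mock construction described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the choice of placing the observer at the origin with the beam axis along +z, and the example index values are hypothetical. For a number density $n \propto d^{\alpha}$ inside a cone, the number of stars per unit distance scales as $d^{\alpha+2}$, so distances can be drawn by inverting the cumulative distribution; setting `alpha = 0` gives the uniform random sample.

```python
import math
import random


def sample_distance(alpha, d_min, d_max, rng):
    """Draw a distance with p(d) proportional to d**(alpha + 2) on [d_min, d_max]."""
    k = alpha + 3.0
    u = rng.random()
    if abs(k) < 1e-12:
        # alpha = -3: p(d) is proportional to 1/d, so d is log-uniform
        return d_min * (d_max / d_min) ** u
    return (d_min**k + u * (d_max**k - d_min**k)) ** (1.0 / k)


def mock_pencil_beam(n_stars, alpha, d_min=1.0, d_max=3.0,
                     radius_deg=1.5, seed=0):
    """Mock stars (x, y, z) in kpc inside a cone of angular radius radius_deg.

    The observer sits at the origin and the beam points along +z
    (an arbitrary choice made for this sketch).
    """
    rng = random.Random(seed)
    cos_max = math.cos(math.radians(radius_deg))
    stars = []
    for _ in range(n_stars):
        d = sample_distance(alpha, d_min, d_max, rng)
        cos_t = rng.uniform(cos_max, 1.0)   # uniform in solid angle
        sin_t = math.sqrt(max(0.0, 1.0 - cos_t**2))
        phi = rng.uniform(0.0, 2.0 * math.pi)
        stars.append((d * sin_t * math.cos(phi),
                      d * sin_t * math.sin(phi),
                      d * cos_t))
    return stars


# One mock realization with a negative gradient, plus a uniform random
# sample that is ten times denser (alpha = 0 means constant number density).
mock = mock_pencil_beam(1000, alpha=-2.0, seed=1)
randoms = mock_pencil_beam(10000, alpha=0.0, seed=2)
```

Repeating the mock draw with different seeds and averaging the resulting correlation functions mirrors the 1000-realization procedure described in the text.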

Figure 4 shows the resulting correlation functions for our adopted density gradients. The
overall shape of the correlation function is quite complex, and is very sensitive to the
density profile. On small scales (< 50 pc) the correlation function is always boosted,
regardless of whether the underlying density gradient is positive or negative. However,
on larger scales (50-500 pc), the clustering is depressed or boosted depending on whether
the density gradient is negative or positive, respectively. Finally, on even larger
scales (> 500 pc) the sign of this dependence flips. There is an interesting feature at
r ~ 0.1 kpc, where the correlation function is a minimum or maximum for negative and
positive density gradients, respectively. This is approximately equal to the diameter of
the pencil-beam volume at its halfway point along the line-of-sight.
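The quoted scale can be checked with simple geometry. The short Python sketch below (the function name is hypothetical) computes the physical diameter of a cone of angular radius 1.5° at the midpoint of the 1-3 kpc distance range.

```python
import math


def beam_diameter_kpc(distance_kpc, radius_deg):
    """Physical diameter of a cone of angular radius radius_deg at a given distance."""
    return 2.0 * distance_kpc * math.tan(math.radians(radius_deg))


midpoint_kpc = 0.5 * (1.0 + 3.0)                    # halfway along the 1-3 kpc beam
diameter = beam_diameter_kpc(midpoint_kpc, 1.5)     # about 0.105 kpc
```

The result, roughly 0.1 kpc, matches the scale of the feature noted above; the same function can be evaluated for wider beams to see how the scale grows with angular radius.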

Next, we investigate how the survey geometry can affect the correlation function. We set
the underlying density profile of our mock star samples to a single power law with a
negative index, and vary the sample geometry. The geometries we test all range in
distance from 1-3 kpc, as in our SEGUE sample, but their angular size on the sky varies
from the pencil beam of radius $\theta_{\rm radius} = 1.5°$ (as in SEGUE), to larger
beams of radius 3°, 6°, 12°, as well as a full-sky geometry. As before, we generate 1000
independent mock samples for each geometry, and calculate their average correlation
function using a uniform and dense random sample. Figure 5 reveals a fairly simple
dependence of the correlation function on survey geometry. On scales that are much
smaller than the width of the pencil beam, the correlation function is unaffected by the
survey geometry, as might be expected. However, the feature in the correlation function
that occurs at 0.1 kpc for the SEGUE geometry shifts to progressively larger scales as
the width of the pencil beam grows. In fact, the scale of the feature is always
approximately equal to the diameter of the pencil-beam volume at its halfway point along
the line-of-sight.

Figure 6. The two-point correlation functions of SEGUE G-dwarf stars. Each gray line is
measured from one of 152 individual SEGUE lines of sight. The shapes of the correlation
functions are similar to those for the negative gradients shown in Fig. 4.

These tests demonstrate that the correlation function of stars will depend sensitively on
both the underlying density gradients and the survey geometry. The resulting correlation
function has a peculiar shape that is quite different from the power-law shape we are
accustomed to seeing for galaxy surveys. The strong dependence on the underlying density
gradients suggests that the correlation function of stars could have strong constraining
power on Galactic structure models. This is especially true at the smallest scales and
when density gradients are steep, since this is where the correlation function is most
sensitive to variations in the underlying density distribution. The explanation for this
is fairly straightforward. At the smallest scales we probe (< 10 pc), the mean separation
between stars is much larger, and so there would not be many pairs if the stars were
randomly distributed. If, however, there is a steep enough density gradient, the stars
are redistributed so that they become sufficiently dense at either the near or far end of
the survey volume (depending on whether the gradient is negative or positive), thus
leading to several small-scale pairs.

To measure the correlation function of SEGUE G-dwarf stars, we first construct a random
sample with the same pencil-beam geometry as our sample, and containing uniformly
distributed points with 100 times higher number density than the SEGUE data. We then
calculate the correlation function of each SEGUE line-of-sight independently, i.e., we
only count pairs of stars that reside in the same SEGUE field. Figure 6 shows the result,
in which each gray line is the correlation function of an individual SEGUE pencil beam.
The measured correlation functions have the same peculiar shape seen in the mock tests in
Figure 4. In particular, they are similar to the cases of negative density gradients,
which is expected, since all SEGUE lines of sight lead out of the Galactic disk.

The distances to the SEGUE stars are not known perfectly, but rather contain, on average,
12% uncertainties. It is thus important to determine how much these errors can affect the
correlation function measurements. We test this issue by adding 12% Gaussian-distributed
distance errors to our mock samples, and then recalculating the correlation functions.
These tests demonstrate that 12% distance uncertainties have a negligible effect on the
correlation function.
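One way such a test could be set up is sketched below in Python. The details are assumptions made for illustration: the function name is hypothetical, the observer is placed at the origin so that a distance error moves a star along its line of sight, and the sketch does not decide whether stars scattered outside the 1-3 kpc range are subsequently removed.

```python
import random


def perturb_distances(stars, frac_sigma=0.12, seed=0):
    """Scale each star's distance by a Gaussian factor with mean 1 and width frac_sigma.

    stars : list of (x, y, z) positions relative to the observer at the origin.
    The direction on the sky is preserved; only the distance changes.
    """
    rng = random.Random(seed)
    perturbed = []
    for x, y, z in stars:
        # Guard against the (extremely unlikely) non-positive scale factor.
        f = max(rng.gauss(1.0, frac_sigma), 1e-6)
        perturbed.append((x * f, y * f, z * f))
    return perturbed
```

Recomputing the correlation function on the perturbed mock and comparing it with the unperturbed result then quantifies the impact of the distance errors.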


4. FITTING A SMOOTH GALACTIC MODEL


Since the two-point correlation function of G dwarfs is highly sensitive to
stellar-density gradients, it can serve as a tool to probe the smooth density structure
of the Milky Way. We approach this by replacing the uniform random sample in equation 1
with a mock sample generated from a Milky Way model,

$$\xi'(r) = \frac{DD(r)}{MM(r)} - 1, \qquad (2)$$

where MM are the normalized pair counts from our Milky Way model. If the model we choose
truly represents the underlying stellar distribution and has the same geometry as the
data, then $\xi'(r)$ should cancel on all scales and along all lines-of-sight. By
searching the parameter space of a given model, we can thus place constraints on the
model parameters, and determine to what extent the model can explain the observed stellar
clustering.

As a proof of concept, we choose a standard thin + thick-disk model with two exponential
disk components and five parameters,

$$n(R,Z) \propto \mathrm{sech}^2\!\left(\frac{Z}{2Z_{0,\rm thin}}\right) \exp\!\left(-\frac{R}{R_{0,\rm thin}}\right) + a\,\mathrm{sech}^2\!\left(\frac{Z}{2Z_{0,\rm thick}}\right) \exp\!\left(-\frac{R}{R_{0,\rm thick}}\right), \qquad (3)$$

where $Z_{0,\rm thin}$, $Z_{0,\rm thick}$, $R_{0,\rm thin}$, and $R_{0,\rm thick}$ are
the scale heights and scale lengths of the thin disk and the thick disk, respectively,
and the fifth parameter is the ratio of the normalization factors of the thick and the
thin disk, $a = n_{0,\rm thick}/n_{0,\rm thin}$. In a recent study, Bovy et al. (2012a)
reported that when one separates disk populations by their chemical signatures, there is
a continuous range of disk thicknesses, and there is no distinct thick disk component.
Since we do not apply any additional color or metallicity cuts in the sample, for
simplicity we stick to the traditional bi-modal disk model. Our model does not include a
bulge or halo component because, in the restricted range of distances we probe (1-3 kpc),
these components should contribute a negligible number of stars to our sample.
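For concreteness, the two-disk density of equation 3 can be written as a small Python function. This is a sketch under stated assumptions: it reads the sech² argument in equation 3 as $Z/(2Z_0)$, the function and argument names are hypothetical, and the value of `a` in the example call is a placeholder rather than a fitted value.

```python
import math


def sech2(x):
    """Hyperbolic secant squared."""
    return 1.0 / math.cosh(x) ** 2


def disk_density(R, Z, z_thin, z_thick, r_thin, r_thick, a):
    """Unnormalized thin + thick disk number density at Galactocentric (R, Z).

    All lengths must share one unit (e.g. kpc). `a` is the thick-to-thin
    normalization ratio. Only relative values matter, because the overall
    normalization cancels when the model pair counts are normalized.
    """
    thin = sech2(Z / (2.0 * z_thin)) * math.exp(-R / r_thin)
    thick = sech2(Z / (2.0 * z_thick)) * math.exp(-R / r_thick)
    return thin + a * thick


# Example call with lengths in kpc; a = 0.1 is an arbitrary placeholder.
n_example = disk_density(8.0, 0.5, z_thin=0.233, z_thick=0.674,
                         r_thin=2.34, r_thick=2.51, a=0.1)
```

Evaluating this function at the position of every point in a uniform random catalog provides the per-point weights used to build the model pair counts.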
Figure 7. The distribution of scale heights for the thin and the thick disk from the MCMC
chain. The main panel shows $1\sigma$, $2\sigma$, and $3\sigma$ likelihood contours for
the joint probability distribution of both scale heights, while the smaller panels on top
and to the right show the individual probability distributions of each scale height,
marginalized over all other parameters. The $1\sigma$ statistical precision of these
constraints is 3% and 2% for the thin- and thick-disk scale heights, respectively.


We employ a Markov-chain Monte Carlo (MCMC) method to identify the region in parameter
space where $\xi'(r)$ is consistent with zero, i.e., to find the parameters that best fit
the SEGUE clustering data. At every MCMC step, we need to have a mock catalog from our
model that is generated from a given set of parameter values and has the same SEGUE
survey geometry (all lines-of-sight). Moreover, the mock catalog should be substantially
denser than the SEGUE data, so that the errors in MM are much smaller than the errors in
DD. Generating new dense mock samples and finding pairs at each step of the chain can be
computationally expensive. Instead, we adopt a strategy that is both accurate and more
efficient. We first generate a single dense and uniformly distributed random sample with
the SEGUE geometry (all lines-of-sight) and identify all the pairs of points in bins of
separation. At each step in the chain we assign a new weight, $w_i$, to each random point
according to equation 3. We then calculate MM(r) by summing the product $w_i w_j$ over
all pairs with separation r. Finally, we normalize MM by the sum of $w_i w_j$ over all
pairs and all scales. When normalizing, the absolute normalization of n(R, Z) cancels and
is thus irrelevant.
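The reweighting strategy can be sketched as follows in Python. This is an illustrative sketch, not the authors' implementation: the function name and the data layout (a list of index pairs per separation bin, found once in the uniform random sample) are assumptions, and the normalization here runs over the pairs stored in the supplied bins.

```python
def model_pair_counts(pairs_by_bin, weights):
    """Normalized model pair counts MM(r) from a fixed set of random-point pairs.

    pairs_by_bin : list over separation bins; each entry is a list of (i, j)
                   index pairs identified once in the uniform random sample
    weights      : w_i, the model density evaluated at random point i for the
                   current parameter values
    """
    raw = [sum(weights[i] * weights[j] for i, j in pairs)
           for pairs in pairs_by_bin]
    norm = sum(raw)                      # sum of w_i * w_j over all stored pairs
    return [c / norm for c in raw]
```

Because only the weights change from one chain step to the next, the expensive pair finding is done once, and multiplying every weight by a constant leaves the output unchanged, which reflects the statement that the absolute normalization of the density cancels.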

In each of our 152 pencil-beam volumes, we calculate $\xi'(r)$ in 12 logarithmic bins
ranging from 5 pc to 2 kpc. Excluding any bins that have zero pair counts in DD, we have
1,777 individual measurements of $\xi'(r)$. We estimate the total $\chi^2$ using

$$\chi^2 = \sum \left[\frac{\xi'(r)}{\sigma_{\xi}(r)}\right]^2, \qquad (4)$$

which sums over all scales and all pencil beams. We use jackknife resampling to estimate
the uncertainties of pair counting in both the data and the model. The final uncertainty,
$\sigma_{\xi}(r)$, is a combination of the uncertainty in the data and the uncertainty in
our model, although the pair counting in our model always has much smaller uncertainties
than the data because it has a much higher number density. We treat all $\xi'(r)$ as
independent measurements and ignore the covariances. We will investigate the covariances
in a future study.

Figure 8. The distribution of scale lengths of the thin and the thick disk from the MCMC
chain. All features are similar to those in Fig. 7. The $1\sigma$ statistical precision
of these constraints is 20% and 8% for the thin- and thick-disk scale lengths,
respectively.
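A minimal Python sketch of this goodness-of-fit sum is given below. It assumes that the model expectation for the residual correlation function is zero in every bin, that the measurements are independent (as stated in the text), and that bins with zero data pairs are skipped; the function name and the flat-list data layout are hypothetical.

```python
def chi_square(xi_prime, sigma, dd_counts):
    """Sum of (xi' / sigma)^2 over all bins with at least one data pair.

    xi_prime  : residual correlation function values, one per (field, bin)
    sigma     : combined data + model uncertainty for each value
    dd_counts : data pair counts for each value; bins with zero pairs are skipped

    Returns (chi2, number_of_bins_used).
    """
    chi2 = 0.0
    n_used = 0
    for x, s, dd in zip(xi_prime, sigma, dd_counts):
        if dd == 0:
            continue
        chi2 += (x / s) ** 2
        n_used += 1
    return chi2, n_used
```

The second return value plays the role of the number of individual measurements; subtracting the number of fitted parameters from it gives the degrees of freedom.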

Figures 7 and 8 show the scale-height and scale-length distributions from our MCMC
chains. After marginalizing over all other parameters, we obtain a thin-disk scale height
of 233 ± 7 pc and scale length of 2.34 ± 0.48 kpc, and a thick-disk scale height of
674 ± 16 pc and scale length of 2.51 ± 0.19 kpc. While these numbers are in the same
broad range as other recent measurements using SEGUE or SDSS data (e.g., Jurić et al.
2008; Carollo et al. 2010; de Jong et al. 2010; Bensby et al. 2011; Cheng et al. 2012;
Bovy et al. 2012b), they are not in statistical agreement with most of these studies.
Unfortunately, it is difficult to directly compare our results to other studies because
the star samples differ significantly in most cases (i.e., different types of stars or
different metallicity or color cuts). For example, our thick-disk scale height and length
are significantly lower than those measured by Jurić et al. (2008), but that study used
M stars from the SDSS. Our thick-disk scale length is significantly higher than the one
measured by Cheng et al. (2012), who also used SEGUE data, but that study focused on
$\alpha$-enhanced stars. Our thin-disk scale height and length are somewhat smaller than
those measured by Bovy et al. (2012b), but in that study 'thin' and 'thick' disks refer
to single disk fits to either $\alpha$-young or $\alpha$-old G-dwarf subsamples,
respectively. It would be interesting to repeat our measurements on different subsets of
the data, so that we may better compare our constraints to other investigations.

We check the accuracy with which our fitting methodology can recover disk parameters by
creating a mock SEGUE sample from equation 3 and then analyzing it in the same way as we
have analyzed the SEGUE G-dwarf sample. Our modeling methodology successfully recovers
the correct thin- and thick-disk parameters within the $1\sigma$ error bars. This
exercise demonstrates that our Milky Way constraints do not contain systematic errors due
to the methodology. However, there may be systematic errors in our constraints that arise
from errors in the SEGUE weights we use. Although we do not expect these errors to be
large given the fairly homogeneous nature of the SEGUE G-dwarf sample in the narrow
distance range that we study, we cannot guarantee that these systematic errors are
smaller than our statistical errors. The main point to emphasize is that the high
statistical precision of our measurements (2-3% for the scale heights and 8-20% for the
scale lengths) demonstrates the constraining power of the correlation function statistic
for Galactic studies. We note that our statistical precision is still considerably lower
than that reported by Bovy et al. (2012b), which is three to five times higher. This is
most likely due to the fact that we measure the correlation function of each SEGUE
line-of-sight separately, which means that the overall variation in stellar density from
one sightline to another does not contribute to our model constraints. We can improve on
this by measuring a single correlation function that includes cross-sightline pairs, and
we leave this to a future study.

We also investigated how well the two-disk model in Equation 3 explains the measured
clustering of SEGUE G dwarfs. The $\chi^2$ value for our best-fit model is 2,853 for
1,772 degrees of freedom, suggesting that the model is strongly ruled out. For
comparison, we tried a single exponential disk model with only two parameters. The
best-fit value of $\chi^2$ in that case is 4,384 for 1,775 degrees of freedom. The
two-disk model is thus strongly preferred over the single-disk model. However, even the
two-disk model is excluded by our correlation function measurements.
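To put these numbers in context, the short Python sketch below computes the reduced $\chi^2$ and a rough significance for both fits. The significance uses the large-degrees-of-freedom Gaussian approximation to the $\chi^2$ distribution (mean equal to the number of degrees of freedom, variance equal to twice that number), which is an approximation introduced here for illustration and is not part of the original analysis; the function names are hypothetical.

```python
import math


def reduced_chi2(chi2, dof):
    """Chi-square per degree of freedom."""
    return chi2 / dof


def excess_sigma(chi2, dof):
    """Excess of chi2 over its expectation, in units of sqrt(2 * dof).

    Uses the Gaussian approximation to the chi-square distribution, which is
    reasonable for the large number of degrees of freedom involved here.
    """
    return (chi2 - dof) / math.sqrt(2.0 * dof)


two_disk = (reduced_chi2(2853, 1772), excess_sigma(2853, 1772))
one_disk = (reduced_chi2(4384, 1775), excess_sigma(4384, 1775))
```

The reduced $\chi^2$ values are about 1.61 for the two-disk model and about 2.47 for the single-disk model, so the two-disk fit is much better but still far from the value of order unity expected for an acceptable fit.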

5. EVIDENCE OF SUBSTRUCTURE?

We next investigate the residual clustering of SEGUE stars relative to our best-fit
two-disk model to see where the model fails. Figure 9 shows $\xi'(r)$ for the best-fit
model along all the lines-of-sight (gray lines), as well as the mean residuals averaged
over all lines-of-sight (red points). It is clear that, although our best-fit model
cancels the correlation function on most scales, there remains significant excess
clustering in the SEGUE data on small scales (< 50 pc) that cannot be explained by the
model. This discrepancy could be due to a number of reasons. It is possible that a smooth
model of the density structure of the Milky Way can in fact fully account for our
clustering measurements, but we have just adopted the wrong model. The "correct" model
could be a two-disk model with a different functional form than Equation 3.
Alternatively, we may be missing one or more components, such as a third disk or, more
likely, a smooth sequence of disks for stars of different ages, as suggested by Bovy et
al. (2012b). Subtle changes to the smooth density model can cause strong deviations in
the correlation function, as demonstrated in Figure 4. Conversely, the excess clustering
that we find could be evidence of substructure in the SEGUE data that cannot be explained
by any smooth density model. For example, this signal could be due to some stars living
in clusters, or could be due to the presence of large localized structures such as
stellar streams. If the excess clustering is produced by localized structures on the sky,
we would expect that those specific SEGUE lines-of-sight are solely responsible for the
failure of our two-disk model to fit the data. We investigate this possibility in
Figure 10, which displays the map of $\chi^2$ values contributed by each SEGUE field
across the sky. The map does not reveal any significant spatial structure in the
$\chi^2$ distribution, suggesting that the remaining signal is probably not caused by
large localized structures such as stellar streams. There is one specific SEGUE field
that has an abnormally high value of $\chi^2$: the red field at a Galactic latitude of
~ -65° in Figure 10. However, removing this line-of-sight does not resolve the
discrepancy between data and model.

Figure 9. Correlation function residuals of SEGUE stars relative to the best-fit two-disk
model. Each gray line shows the residual pair counts for one SEGUE line-of-sight. The red
points show the mean residual, and error bars show the uncertainty in the mean estimated
from the dispersion among the lines-of-sight. The SEGUE data clearly show an excess
clustering at small scales (< 50 pc), suggesting possible substructures that are not
included in our simple two-disk model.

Figure 10. Sky map of $\chi^2$ values for the best-fit two-disk model. The color of each
SEGUE field indicates the contribution to the global $\chi^2$ coming from that particular
line-of-sight. The map reveals no obvious correlation between the goodness of fit and
positions on the sky, indicating that the excess signal in the correlation function is
probably not caused by field-dependent structures.

6. SUMMARY AND DISCUSSION 

In this paper we explore applying a traditional clustering statistic, the spatial
two-point correlation function, to stars in the Milky Way as a probe of Galactic
structure. Tests with mock samples have shown that the shape of the correlation function
is sensitive to both the stellar-density gradients in the Galactic disk and the survey
geometry. We have measured the correlation function of SDSS SEGUE G-dwarf stars, which
form a large and homogeneous sample with well-understood selection criteria, geometry,
and distance errors. By comparing our measurements to a two-disk Galactic model, our
measured correlation functions yield tight constraints on the structure of the thin and
thick disk of the Milky Way. Specifically, the thin- and thick-disk scale heights are
determined with a precision of 3% and 2%, respectively, while the thin- and thick-disk
scale lengths are determined with a precision of 20% and 8%, respectively. This high
precision is achieved with spatial information alone, and it demonstrates the strong
constraining power of the correlation function. Furthermore, we have studied the
residuals of the SEGUE clustering relative to our best-fit two-disk model, and have found
a small but significant excess of clustering on scales less than 50 pc in the SEGUE data
relative to the smooth model. This clustering may be due to imperfections in the smooth
model or it may be due to the presence of substructure in the SEGUE data that cannot be
described by a smooth model. The main source of systematic error in this analysis comes
from uncertainties in the weights (calculated by Schlesinger et al. 2012) that we use to
account for sample incompleteness. Although we do not expect these uncertainties to be
large, further work is needed to assess the extent to which they affect our model
constraints.

There are several avenues for future work. First, the methodology we have used can be
explored further and improved. For example, we can study the covariances between
different data points and include them in the analysis. We can also probe larger scales
by measuring pairs across neighboring lines-of-sight, instead of sticking to within one
SEGUE field at a time. This should significantly improve the constraining power of the
correlation function and it may detect the signatures of large structures such as stellar
streams. Second, we can study subsamples of SEGUE stars, such as samples in specific
metallicity ranges, in order to better compare our constraints against other works. We
can also explore variants of the spatial correlation function, such as a metallicity- or
age-weighted correlation function or a phase-space correlation function. Finally, we can
further explore the cause of the discrepancy between the clustering of SEGUE stars and
the two-disk model by exploring a larger family of smooth Galactic models.